COP: Planning Conflicts for Faster Parallel Transactional Machine Learning

Authors

  • Faisal Nawab
  • Divyakant Agrawal
  • Amr El Abbadi
  • Sanjay Chawla
Abstract

Machine learning techniques are essential to extracting knowledge from data. The volume of data encourages the use of parallelization techniques to extract knowledge faster. However, schemes to parallelize machine learning tasks face a trade-off between obeying strict consistency constraints and achieving high performance. Existing consistency schemes require expensive coordination between worker threads to detect conflicts, leading to poor performance. In this work, we consider the problem of improving the performance of multi-core machine learning while preserving strong consistency guarantees. We propose Conflict Order Planning (COP), a consistency scheme that exploits special properties of machine learning workloads to reduce the overhead of coordination. What is special about machine learning workloads is that the dataset is often known prior to the execution of the machine learning algorithm and is reused multiple times with different settings. We exploit this prior knowledge of the dataset to plan a partial order for concurrent execution. This planning reduces the cost of consistency significantly because it allows the use of a lightweight conflict detection operation that we call ReadWait. We demonstrate the use of COP on a Stochastic Gradient Descent algorithm for Support Vector Machines and observe better scalability and a speedup of 2x to 6x compared to other consistency schemes.
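
The abstract gives no implementation detail, so the following is only a minimal sketch of the general idea (plan the order from the known dataset, then wait on a planned version at read time), not the authors' algorithm. It assumes that every transaction reads and writes the same set of parameters, that a parameter's version is the id of the last transaction planned to write it, and that a toy integer update stands in for the SVM gradient step. The names `plan_order`, `Store`, `read_wait`, and `toy_update` are hypothetical, and the condition variable is a simplification chosen for readability.

```python
import threading
from collections import defaultdict


def plan_order(transactions):
    """Planning pass over the known dataset (hypothetical helper).

    For each transaction, record which version of every parameter it must
    observe: the id of the last earlier transaction that writes it (0 = none).
    """
    last_writer = defaultdict(int)
    plans = []
    for tid, params in enumerate(transactions, start=1):
        plans.append({p: last_writer[p] for p in params})
        for p in params:
            last_writer[p] = tid
    return plans


class Store:
    """Shared parameters with one version number per parameter."""

    def __init__(self, size):
        self.value = [0] * size
        self.version = [0] * size
        self.changed = threading.Condition()

    def read_wait(self, p, planned_version):
        # Block until the planned writer has installed its value, then read.
        with self.changed:
            self.changed.wait_for(lambda: self.version[p] == planned_version)
            return self.value[p]

    def write(self, p, new_value, tid):
        # Install the value and publish the new version to waiting readers.
        with self.changed:
            self.value[p] = new_value
            self.version[p] = tid
            self.changed.notify_all()


def toy_update(read_values, tid):
    # Assumption: an order-dependent stand-in for a gradient update.
    return sum(read_values) + tid


def run_serial(transactions, size):
    """Reference result: apply the transactions one by one in dataset order."""
    w = [0] * size
    for tid, params in enumerate(transactions, start=1):
        new = toy_update([w[p] for p in params], tid)
        for p in params:
            w[p] = new
    return w


def run_planned(transactions, size, n_workers=4):
    """Run the transactions on several threads, ordered only by the plan."""
    plans = plan_order(transactions)
    store = Store(size)

    def worker(k):
        # Each worker takes its transactions in increasing id order, so the
        # lowest unfinished transaction can always make progress.
        for i in range(k, len(transactions), n_workers):
            tid = i + 1
            params = sorted(transactions[i])
            reads = [store.read_wait(p, plans[i][p]) for p in params]
            new = toy_update(reads, tid)
            for p in params:
                store.write(p, new, tid)

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.value
```

Under these assumptions, `run_planned` should produce the same parameter values as `run_serial` for any list of parameter-index sets, because each transaction touches a parameter only after the transaction planned immediately before it on that parameter has written it. Transactions that share no parameters never wait on each other.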


Similar resources

Improving HTM Scaling with Consistency-Oblivious Programming

We implemented two data structures in a consistency-oblivious programming (COP) style: a red-black tree and a dynamic cache-oblivious B-tree. Unlike a naive transactional style, in which an operation such as an insertion is enclosed in a hardware transaction, in the COP style there are two phases: an oblivious phase that runs with no transactions or locking, and an atomic phase that simply verifie...


Practical and Lock-free Parallel Nesting for Software Transactional Memory

Transactional Memory (TM) provides a strong abstraction to tackle the challenge of synchronizing concurrent tasks that access shared state. Yet, at the same time, TM inhibits the programmer from fully exploring the latent parallelism in his application. In particular, it does not allow a transaction to contain parallel code. This fact limits the expressiveness of TM as a synchronization mechani...


Two-stage fuzzy-stochastic programming for parallel machine scheduling problem with machine deterioration and operator learning effect

This paper deals with the determination of machine numbers and production schedules in manufacturing environments. In this line, a two-stage fuzzy stochastic programming model is discussed with fuzzy processing times where both deterioration and learning effects are evaluated simultaneously. The first stage focuses on the type and number of machines in order to minimize the total costs associat...


Parallel nesting in a lock-free multi-version Software Transactional Memory

Many applications contain large operations that must be performed atomically, which typically leads to many conflicts in optimistic concurrency control mechanisms such as those used by most Transactional Memory (TM) systems. Yet, sometimes these operations could be executed faster if their latent parallelism was used efficiently, but unfortunately few TM systems allow a transaction to be split ...


Implementation tradeoffs in the design of flexible transactional memory support

We present FlexTM (FLEXible Transactional Memory), a high performance TM framework that allows software to determine when (eagerly, lazily, or in a mixed fashion) and how to manage conflicts, while employing hardware to manage transactional state and to track conflicts. FlexTM coordinates four decoupled hardware mechanisms: read and write signatures, which summarize per-thread access sets; per-...




Publication date: 2017